Abstract



The predictability of discrete-time processes is studied in a deterministic setting. A family of
one-step-ahead predictors is suggested for processes whose energy decays at higher frequencies.
For such processes, the prediction error can be made arbitrarily small. The predictions can be
robust with respect to noise contamination at higher frequencies.

Index terms: Bandlimited, causal convolution, discrete time systems, harmonic analysis, prediction,
Szegő-Kolmogorov Theorem.

I Introduction

The paper studies pathwise predictability of discrete time processes in a deterministic setting. It is 
well known that certain restrictions on the frequency distribution can ensure additional opportunities 
for prediction and interpolation of the processes. The classical result for continuous time processes is 
the Nyquist-Shannon-Kotelnikov interpolation theorem for continuous-time band-limited processes.
These processes are used in many models, including econometric models (see examples in [10]). In
theory, it is not possible to conclude that a process is band-limited given some finite interval of
observations. In practice, this conclusion is made based on historical data for a certain process; this
leads to models where processes are assumed to be band-limited. Predictability based on sampling and the
Nyquist-Shannon-Kotelnikov theorem was discussed in [14], [5], [8], [13], [6], [?]. These references deal
with the predictability of continuous-time, band-limited stochastic processes, which include stationary
processes. The predictors obtained in these works were constructed for the setting where the shape of the
spectral representation is supposed to be known. For discrete time processes, predictability can also
be achieved given some properties of spectral representations. For stationary discrete time Gaussian
processes, the Szegő-Kolmogorov Theorem ensures that the optimal prediction error is zero if the spectral
density $\phi$ is such that $\int_{-\pi}^{\pi}\log\phi(e^{i\omega})\,d\omega=-\infty$ (see, e.g., [11], p. 68). However, it was unknown how to
construct a predictor when the shape of the spectral density is unknown. This long-standing problem was
addressed in [3]: a predictor for general band-limited time series was suggested in a deterministic
setting. This predictor was a modification of the predictor obtained in [1] for continuous time processes.

Unfortunately, there are serious limitations to the practical use of the predictors constructed for the 
band-limited processes. A common argument dismissing the effectiveness of these predictors is that the 
predictors are not robust with respect to small noise contamination. This leads to the conclusion that 
the predictability is an abnormality that disappears with the presence of arbitrarily small noise or some 
incompleteness of historical data. 

This paper addresses these problems again. We consider discrete time processes with some restrictions
on the rate of energy decay on the higher frequencies. We establish the predictability of these
processes, and we suggest new linear predictors represented by causal convolution sums over past times
representing historical observations (Theorem 1). The future values of the process are not supposed
to be calculated precisely but are rather approximated with an error in a prescribed interval that can
be made arbitrarily small uniformly over a wide class of underlying processes. Similarly to [3], the
predictors are given explicitly in the frequency domain. Whereas the results of [3] were restricted to
band-limited discrete-time processes, we now extend the analysis to processes that are not strictly
band-limited. The predictors suggested here are different from the predictors from [3]. The setting of the
present paper is similar to the one for continuous time processes from [2], where predictors were
suggested for processes that were not band-limited but were assumed to have an exponential rate of energy
decay on higher frequencies.

These results shed some new light on the predictability conundrum for band-limited processes.
More precisely, they lead to the conclusion that band-limited processes still allow robust predictability.
The feasible predictability holds in an interval sense only, i.e., the predictor produces an interval that
contains the future value rather than the exact future value. The error can be made arbitrarily small;
however, this would require a large enough norm of the predictor's transfer function. We show that this
prediction is robust with respect to noise contamination in the following sense. Given the size of the
prediction error that is associated with a process that is free of noise contamination, the additional
prediction error that is attributable to noise contamination depends linearly on the product of the noise
intensity with a norm associated with the transfer function used to form the prediction from preceding
values of the process (see Section IV). If the predictor is targeting too small an error for processes
without noise contamination, the norm of the transfer function increases, and this robustness vanishes.

The paper is organized in the following manner. In Section II we formulate the definitions and the
main result. In Section III we prove the main theorem concerning the predictability of processes with
a certain rate of energy decay on higher frequencies. In Section IV we discuss the robustness of the
predictors with respect to noise contamination. Finally, in Section V we summarize our results and
offer suggestions for further research.

II Definitions and main result

Let $\mathbb{D}=\{z\in\mathbb{C}: |z|<1\}$, $\mathbb{D}^c=\mathbb{C}\setminus\mathbb{D}$, and $\mathbb{T}=\{z\in\mathbb{C}: |z|=1\}$.

We denote by $\ell_r$ the set of all sequences $x=\{x(t)\}\subset\mathbb{R}$, $t=0,\pm1,\pm2,\dots$, such that
$\|x\|_{\ell_r}=\left(\sum_{t=-\infty}^{\infty}|x(t)|^r\right)^{1/r}<+\infty$ for $r\in[1,\infty)$ or $\|x\|_{\ell_\infty}=\sup_t|x(t)|<+\infty$ for $r=+\infty$.

Let $\ell_r^+$ be the set of all sequences $x\in\ell_r$ such that $x(t)=0$ for $t=-1,-2,-3,\dots$.

For $x\in\ell_1$ or $x\in\ell_2$, we denote by $X=\mathcal{Z}x$ the Z-transform
$$X(z)=\sum_{t=-\infty}^{\infty}x(t)z^{-t},\quad z\in\mathbb{C}.$$
Respectively, the inverse $x=\mathcal{Z}^{-1}X$ is defined as
$$x(t)=\frac{1}{2\pi}\int_{-\pi}^{\pi}X(e^{i\omega})e^{i\omega t}\,d\omega,\quad t=0,\pm1,\pm2,\dots.$$
If $x\in\ell_2$, then $X|_{\mathbb{T}}$ is defined as an element of $L_2(\mathbb{T})$.
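The transform pair above can be checked numerically for a finitely supported sequence. The following is a minimal sketch, not part of the original paper; the function names, the dictionary representation of $x$, and the grid size are illustrative assumptions. The integral in the inverse transform is replaced by a uniform Riemann sum, which is exact (up to rounding) for trigonometric polynomials of degree below the grid size.

```python
import cmath
import math

def z_transform_on_circle(x, omega):
    """X(e^{i*omega}) = sum_t x(t) * e^{-i*omega*t} for a finitely supported x.

    x is a dict {t: x(t)}; times that are not listed are treated as zero.
    """
    return sum(v * cmath.exp(-1j * omega * t) for t, v in x.items())

def inverse_z_transform(X, t, n_grid=4096):
    """Approximate x(t) = (1/(2*pi)) * int_{-pi}^{pi} X(e^{iw}) e^{iwt} dw
    by a uniform Riemann sum with n_grid nodes (X is a function of w)."""
    step = 2.0 * math.pi / n_grid
    total = 0j
    for j in range(n_grid):
        w = -math.pi + j * step
        total += X(w) * cmath.exp(1j * w * t)
    return (total * step / (2.0 * math.pi)).real

# Hypothetical test sequence: nonzero only at t = -1, 0, 3.
x_demo = {-1: 0.5, 0: 2.0, 3: -1.0}
X_demo = lambda w: z_transform_on_circle(x_demo, w)
recovered = {t: inverse_z_transform(X_demo, t) for t in range(-2, 5)}
```

In this sketch `recovered` reproduces the entries of `x_demo` and is numerically zero at the other times.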

Let $H^r(\mathbb{D}^c)$ be the Hardy space of functions that are holomorphic on $\mathbb{D}^c$ including the point at infinity
(see, e.g., [?]). Note that the Z-transform defines a bijection between the sequences from $\ell_2^+$ and the
restrictions (i.e., traces) of the functions from $H^2(\mathbb{D}^c)$ on $\mathbb{T}$.

Definition 1 Let $\mathcal{K}$ be the class of functions $\hat k:\mathbb{Z}\to\mathbb{R}$ such that $\hat k(t)=0$ for $t<0$ and such that
$\hat K(\cdot)=\mathcal{Z}\hat k\in H^\infty(\mathbb{D}^c)$.

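Definition 2 Let $\mathcal{Y}\subset\ell_r$ be a class of processes.

(i) We say that this class is $\ell_r$-predictable if there exists a sequence $\{\hat k_m(\cdot)\}_{m=1}^{+\infty}\subset\mathcal{K}$ such that
$$\|x(t+1)-\hat x_m(t)\|_{\ell_r}\to0\quad\text{as}\quad m\to+\infty\quad\forall x\in\mathcal{Y}.$$
Here $\hat x_m(t)=\sum_{s=-\infty}^{t}\hat k_m(t-s)x(s)$.

(ii) We say that the class $\mathcal{Y}$ is uniformly $\ell_r$-predictable if for any $\varepsilon>0$, there exists $\hat k(\cdot)\in\mathcal{K}$ such
that
$$\|x(t+1)-\hat x(t)\|_{\ell_r}\le\varepsilon\quad\forall x\in\mathcal{Y}.$$
Here $\hat x(t)=\sum_{s=-\infty}^{t}\hat k(t-s)x(s)$.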
Let some $q>1$ be given. For $c>0$ and $\omega\in[-\pi,\pi]$, set
$$h(\omega,c)=\exp\frac{c}{|e^{i\omega}+1|^{q}}.$$
Let $\mathcal{X}(c)$ be the class of all sequences $x\in\ell_2$ such that
$$\operatorname*{ess\,sup}_{\omega\in[-\pi,\pi]}|X(e^{i\omega})|\,h(\omega,c)<+\infty,\qquad (1)$$
where $X=\mathcal{Z}x$. Let $\mathcal{X}=\cup_{c>0}\mathcal{X}(c)$.

Note that $h(\omega,c)\to+\infty$ as $\omega\to\pm\pi$ and that (1) holds for degenerate processes, with $X(e^{i\omega})$
approaching zero with a sufficient rate of decay as $\omega\to\pm\pi$. In particular, the class $\mathcal{X}$ includes all
band-limited processes $x$ such that $X(e^{i\omega})=0$ for $\omega\notin[-\omega_1,\omega_1]$ for some $\omega_1\in(0,\pi)$, where $X=\mathcal{Z}x$.
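As an illustration of condition (1), the sketch below estimates the weighted supremum on a frequency grid. It assumes the weight has the form $h(\omega,c)=\exp(c/|e^{i\omega}+1|^q)$, which is the form used in the proof of Lemma 1(iv); the function names, the grid, and the two test spectra are hypothetical and serve only to show that a band-limited spectrum passes the test while a flat spectrum does not.

```python
import cmath
import math

def log_h(omega, c, q):
    """log h(omega, c) = c / |e^{i*omega} + 1|^q (infinite if the distance is 0)."""
    dist = abs(cmath.exp(1j * omega) + 1.0)
    return math.inf if dist == 0.0 else c / dist ** q

def weighted_sup(X_abs, c, q, n_grid=2001):
    """Grid estimate of sup_w |X(e^{iw})| * h(w, c) over [-pi, pi].

    X_abs(w) returns |X(e^{iw})|. Points where it vanishes are skipped, so a
    band-limited spectrum stays finite even where h blows up. Logs are used
    to avoid floating-point overflow; values beyond exp(700) are reported
    as infinity.
    """
    best_log = -math.inf
    for j in range(n_grid):
        w = -math.pi + 2.0 * math.pi * j / (n_grid - 1)
        a = X_abs(w)
        if a > 0.0:
            best_log = max(best_log, math.log(a) + log_h(w, c, q))
    if best_log == -math.inf:
        return 0.0
    return math.exp(best_log) if best_log < 700.0 else math.inf

# Hypothetical spectra: an ideal band-limited one and a flat one.
band_limited = lambda w: 1.0 if abs(w) <= 1.0 else 0.0
flat = lambda w: 1.0
finite_value = weighted_sup(band_limited, c=1.0, q=2.0)
infinite_value = weighted_sup(flat, c=1.0, q=2.0)
```

A grid estimate can only suggest membership in the class; it is not a proof that the essential supremum is finite.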

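Theorem 1 Let either $r=2$ or $r=+\infty$.

(i) The class $\mathcal{X}$ is $\ell_r$-predictable.

(ii) Let $c_0>0$ be given, and let $\mathcal{U}(c_0)$ be a class of processes $x(\cdot)\in\mathcal{X}(c_0)$ such that
$$\operatorname*{ess\,sup}_{\omega\in[-\pi,\pi]}|X(e^{i\omega})h(\omega,c_0)|\le1$$
for $X=\mathcal{Z}x$. Then this class $\mathcal{U}(c_0)$ is uniformly $\ell_r$-predictable.

(iii) A sequence of predicting kernels that ensures the prediction required in (i) and (ii) can be constructed
as follows. Let $\mu>1$ be given. For $\gamma>0$, set
$$\alpha=\alpha(\gamma)=1-\gamma^{-\frac{2\mu}{q-1}},\qquad V(z)=1-\exp\left(-\frac{\gamma}{z+\alpha}\right),\qquad \hat K(z)=zV(z).$$
Then the required sequence of kernels that ensures the prediction required in (i) and (ii) is $\hat k(\cdot)=\hat k(\cdot,\gamma)=\mathcal{Z}^{-1}\hat K$, where $\gamma=\gamma_i\to+\infty$. For these kernels,
$$\|x(t+1)-\hat x(t)\|_{\ell_r}\to0\quad\text{as}\quad\gamma\to+\infty\quad\forall x\in\mathcal{X}.$$
Moreover, for any $c_0>0$ and $\varepsilon>0$, there exists $\gamma>0$ such that
$$\|x(t+1)-\hat x(t)\|_{\ell_r}\le\varepsilon\quad\forall x\in\mathcal{U}(c_0).\qquad (2)$$
Here $\hat x(t)=\sum_{s=-\infty}^{t}\hat k(t-s)x(s)$.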

Note that any particular predictor described in Theorem 1 ensures predictability in an interval sense
only, i.e., it produces an interval $[\hat x(t)-\varepsilon,\hat x(t)+\varepsilon]$ that contains $x(t+1)$ rather than the exact
value of $x(t+1)$. However, this $\varepsilon$ can be made arbitrarily small via selection of a large enough $\gamma$.

The family of predicting kernels $\hat k$ introduced above represents an extension to the discrete time
setting of the construction introduced in [2] for continuous time processes with exponential rate of
decay of energy on higher frequencies.
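The effect of the parameter $\gamma$ can be seen directly in the frequency domain. The sketch below assumes the kernel family of Theorem 1(iii) in the form $\alpha(\gamma)=1-\gamma^{-2\mu/(q-1)}$, $V(z)=1-\exp(-\gamma/(z+\alpha))$, $\hat K(z)=zV(z)$, and evaluates $|\hat K(e^{i\omega_0})-e^{i\omega_0}|$, which is the modulus of the one-step error for the input $e^{i\omega_0 t}$. A pure complex exponential is outside $\ell_2$, so this is a heuristic frequency-response illustration rather than an instance of the theorem; the function names and the values $q=\mu=2$ are assumptions made here.

```python
import cmath

def alpha_of_gamma(gamma, q, mu):
    """alpha(gamma) = 1 - gamma^(-2*mu/(q-1))."""
    return 1.0 - gamma ** (-2.0 * mu / (q - 1.0))

def V(z, gamma, alpha):
    """V(z) = 1 - exp(-gamma / (z + alpha))."""
    return 1.0 - cmath.exp(-gamma / (z + alpha))

def K_hat(z, gamma, alpha):
    """Predictor transfer function K_hat(z) = z * V(z)."""
    return z * V(z, gamma, alpha)

def one_step_error(omega0, gamma, q=2.0, mu=2.0):
    """|K_hat(e^{i*w0}) - e^{i*w0}|: modulus of the one-step prediction
    error for the input e^{i*w0*t} (ideal one-step shift is K(z) = z)."""
    alpha = alpha_of_gamma(gamma, q, mu)
    z = cmath.exp(1j * omega0)
    return abs(K_hat(z, gamma, alpha) - z)

# Error at a fixed low frequency for increasing gamma (illustrative values).
errors = [one_step_error(0.5, g) for g in (2.0, 5.0, 10.0, 20.0)]
```

For a fixed frequency inside $(-\pi,\pi)$ the values in `errors` decrease roughly like $e^{-\gamma/2}$, in line with the convergence $V(e^{i\omega})\to1$.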

III Proofs

In our setting, $x(t+1)$ is the output of an anticausal convolution with the transfer function $K(z)=z$, i.e.,
$x(t+1)=\mathcal{Z}^{-1}(K\mathcal{Z}x)(t)$.

Let $\Omega(\alpha)=\arccos(-\alpha)$, let $D_+(\alpha)=(-\Omega(\alpha),\Omega(\alpha))$, and let $D(\alpha)=[-\pi,\pi]\setminus D_+(\alpha)$. We
have that $\cos(\Omega(\alpha))+\alpha=0$, $\cos(\omega)+\alpha>0$ for $\omega\in D_+(\alpha)$, and $\cos(\omega)+\alpha\le0$ for $\omega\in D(\alpha)$.

Note that $\alpha=\alpha(\gamma)\to1$ as $\gamma\to+\infty$.

Lemma 1 (i) $V(z)\in H^\infty(\mathbb{D}^c)$ and $\hat K(z)=K(z)V(z)\in H^\infty(\mathbb{D}^c)$.
(ii) $V(e^{i\omega})\to1$ for all $\omega\in(-\pi,\pi)$ as $\gamma\to+\infty$.
(iii) If $\omega\in(-\Omega(\alpha),\Omega(\alpha))$ then $\operatorname{Re}\frac{\gamma}{e^{i\omega}+\alpha}>0$ and $|V(e^{i\omega})-1|\le1$.

(iv) For any $c>0$, there exists $\gamma_0>0$ such that for any $\gamma\ge\gamma_0$ and for $V$ selected with $\alpha=\alpha(\gamma)$ we
have $\int_{D(\alpha)}|V(e^{i\omega})-1|^p h(\omega,c)^{-p}\,d\omega\le2\arccos(\alpha)$ for any $p\ge1$.

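Proof of Lemma 1. Clearly, $V\in H^\infty(\mathbb{D}^c)$, and $zV(z)=K(z)V(z)\in H^\infty(\mathbb{D}^c)$, since the growth of $z$ is
being compensated by multiplying with $V(z)=-\sum_{k=1}^{\infty}\frac{(-1)^k\gamma^k}{k!\,(z+\alpha)^k}$. Then statement (i) follows.
Further, for $\omega\in(-\pi,\pi)$, we have that
$$\frac{\gamma}{e^{i\omega}+\alpha}=\gamma\,\frac{e^{-i\omega}+\alpha}{|e^{i\omega}+\alpha|^2}.$$
Hence
$$\operatorname{Re}\left(\frac{\gamma}{e^{i\omega}+\alpha}\right)=\gamma\,\frac{\cos(\omega)+\alpha}{|e^{i\omega}+\alpha|^2}.$$
If $\gamma\to+\infty$ then $\alpha=\alpha(\gamma)\to1$. This implies statements (ii)-(iii).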
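The passage from the real part to statements (ii)-(iii) can be written out as follows. This is a worked step inferred from the definition $V(z)=1-\exp(-\gamma/(z+\alpha))$ and is not quoted from the original proof.

```latex
\begin{align*}
|V(e^{i\omega})-1|
  &= \left|\exp\left(-\frac{\gamma}{e^{i\omega}+\alpha}\right)\right|
   = \exp\left(-\gamma\,\frac{\cos(\omega)+\alpha}{|e^{i\omega}+\alpha|^{2}}\right),\\
\frac{\cos(\omega)+\alpha}{|e^{i\omega}+\alpha|^{2}}
  &= \frac{\cos(\omega)+\alpha}{1+2\alpha\cos(\omega)+\alpha^{2}}
   \;\longrightarrow\; \frac{1+\cos(\omega)}{2+2\cos(\omega)}=\frac12
   \quad\text{as }\alpha\to1,\ \omega\in(-\pi,\pi).
\end{align*}
```

The first line shows that $|V(e^{i\omega})-1|\le1$ whenever $\cos(\omega)+\alpha>0$, which is statement (iii). The second line shows that, for a fixed $\omega\in(-\pi,\pi)$, the exponent behaves like $-\gamma/2$ for large $\gamma$, so that $|V(e^{i\omega})-1|\to0$, which is statement (ii).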

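Let us prove statement (iv). We consider a large enough $\gamma$ such that $\alpha=\alpha(\gamma)>3/4$. For these $\alpha$
and $\omega\in D(\alpha)$, we have that $1/|e^{i\omega}+\alpha|\le2/|e^{i\omega}+1|$. Hence $\gamma|\operatorname{Re}(1/(e^{i\omega}+\alpha))|\le2\gamma/|e^{i\omega}+1|$ for
these $\alpha$ and $\omega$. Hence
$$\exp\left(-\operatorname{Re}\frac{\gamma}{e^{i\omega}+\alpha}\right)\le\exp\frac{2\gamma}{|e^{i\omega}+1|}$$
for all $\omega\in D(\alpha)$. Let $\rho(\alpha)=|e^{i\Omega(\alpha)}+1|^{-1}$. By the choice of $\alpha=\alpha(\gamma)$, it follows that
$$\rho(\alpha)=(2-2\alpha)^{-1/2}=\left(2\gamma^{-\frac{2\mu}{q-1}}\right)^{-1/2}=2^{-1/2}\gamma^{\frac{\mu}{q-1}}.$$
Hence
$$2\gamma\rho(\alpha)^{1-q}=2\gamma\left(2^{-1/2}\gamma^{\frac{\mu}{q-1}}\right)^{1-q}=2^{1-(1-q)/2}\gamma^{1-\mu}.$$
It follows that
$$2\gamma\rho(\alpha)^{1-q}\to0\quad\text{as}\quad\gamma\to+\infty.\qquad (3)$$
By (3), for any $c>0$, there exists $\gamma_0>0$ such that $2\gamma\rho(\alpha)^{1-q}\le c$ for any $\gamma\ge\gamma_0$, and, therefore,
$2\gamma\rho(\alpha)\le c\rho(\alpha)^q$. Moreover, $2\gamma/|e^{i\omega}+1|\le c/|e^{i\omega}+1|^q$ for all $\omega\in D(\alpha)$ since $1/|e^{i\omega}+1|\ge\rho(\alpha)$
for these $\omega$. Hence
$$|V(e^{i\omega})-1|\,h(\omega,c)^{-1}\le1,\qquad|V(e^{i\omega})-1|^p h(\omega,c)^{-p}\le1$$
for all $\omega\in D(\alpha)$. In addition, the measure of the set $D(\alpha)$ is $\pi-\arccos(-\alpha)+\arccos(\alpha)=2\arccos(\alpha)$. This completes the proof of statement (iv) and Lemma 1. □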
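The algebra behind (3) and the pointwise bound used for statement (iv) can be sanity-checked numerically. The sketch below assumes $\alpha(\gamma)=1-\gamma^{-2\mu/(q-1)}$ and $h(\omega,c)=\exp(c/|e^{i\omega}+1|^q)$; the function names and the sample values $\gamma=3$, $c=1$, $q=\mu=2$ are illustrative choices, not taken from the paper. Logarithms are compared because both sides grow extremely fast near $\omega=\pi$.

```python
import math

def rho(gamma, q, mu):
    """rho(alpha) = |e^{i*Omega(alpha)} + 1|^{-1} = (2 - 2*alpha)^{-1/2}
    with alpha = 1 - gamma^(-2*mu/(q-1))."""
    alpha = 1.0 - gamma ** (-2.0 * mu / (q - 1.0))
    return (2.0 - 2.0 * alpha) ** -0.5

def decay_term(gamma, q, mu):
    """Left-hand side of (3): 2 * gamma * rho(alpha)^(1-q)."""
    return 2.0 * gamma * rho(gamma, q, mu) ** (1.0 - q)

def closed_form(gamma, q, mu):
    """Closed form from the proof: 2^(1-(1-q)/2) * gamma^(1-mu)."""
    return 2.0 ** (1.0 - (1.0 - q) / 2.0) * gamma ** (1.0 - mu)

def log_bound_holds(gamma, c, q, mu, n_grid=2000):
    """Check log|V(e^{iw}) - 1| <= log h(w, c) on a grid of w in (Omega, pi).

    The endpoint w = pi is excluded because h is infinite there.
    """
    alpha = 1.0 - gamma ** (-2.0 * mu / (q - 1.0))
    big_omega = math.acos(-alpha)
    for j in range(1, n_grid):
        w = big_omega + (math.pi - big_omega) * j / n_grid
        denom = 1.0 + 2.0 * alpha * math.cos(w) + alpha * alpha  # |e^{iw}+alpha|^2
        log_v = -gamma * (math.cos(w) + alpha) / denom
        log_h = c / (2.0 + 2.0 * math.cos(w)) ** (q / 2.0)  # |e^{iw}+1|^2 = 2+2cos(w)
        if log_v > log_h:
            return False
    return True

values = [decay_term(g, 2.0, 2.0) for g in (2.0, 5.0, 10.0)]
ok = log_bound_holds(3.0, 1.0, 2.0, 2.0)
```

With these sample values $2\gamma\rho(\alpha)^{1-q}\approx0.94\le c$, so the hypothesis of the bound is met. By symmetry in $\omega$, checking the interval $(\Omega(\alpha),\pi)$ covers the negative half of $D(\alpha)$ as well.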

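Proof of Theorem 1. Let $\gamma\to+\infty$, and let $V$, $K$, $\hat K$ be as defined above. Let $k=\mathcal{Z}^{-1}K$ and
$\hat k=\mathcal{Z}^{-1}\hat K$. For $x(\cdot)\in\mathcal{X}$, let $X=\mathcal{Z}x$ and
$$y(t)=\sum_{s=t}^{\infty}k(t-s)x(s)=x(t+1),\qquad\hat y(t)=\sum_{s=-\infty}^{t}\hat k(t-s)x(s).$$
We have that $\hat k=\mathcal{Z}^{-1}\hat K$ is real valued, since $k(\cdot)$ is real valued and $\hat K(\bar z)=\overline{\hat K(z)}$, $\hat K(e^{-i\omega})=\overline{\hat K(e^{i\omega})}$.

Let $Y(e^{i\omega})=(\mathcal{Z}y)(e^{i\omega})=K(e^{i\omega})X(e^{i\omega})$. By the definitions, it follows that $\hat Y(e^{i\omega})=\hat K(e^{i\omega})X(e^{i\omega})=(\mathcal{Z}\hat y)(e^{i\omega})$.

Further, let $p=2$ if $r=2$ and $p=1$ if $r=+\infty$.

We have that $\|\hat Y(e^{i\omega})-Y(e^{i\omega})\|^p_{L_p(-\pi,\pi)}=I_1+I_2$, where
$$I_1=\int_{D(\alpha)}|\hat Y(e^{i\omega})-Y(e^{i\omega})|^p\,d\omega,\qquad I_2=\int_{D_+(\alpha)}|\hat Y(e^{i\omega})-Y(e^{i\omega})|^p\,d\omega.$$
By the assumptions, there exists $c>0$ such that $\|X(e^{i\omega})h(\omega,c)\|_{L_\infty(-\pi,\pi)}<+\infty$. Hence
$$I_1^{1/p}=\|\hat Y(e^{i\omega})-Y(e^{i\omega})\|_{L_p(D(\alpha))}=\|(\hat K(e^{i\omega})-K(e^{i\omega}))X(e^{i\omega})\|_{L_p(D(\alpha))}$$
$$\le\|(V(e^{i\omega})-1)h(\omega,c)^{-1}\|_{L_p(D(\alpha))}\,\|K(e^{i\omega})X(e^{i\omega})h(\omega,c)\|_{L_\infty(-\pi,\pi)}$$
$$\le(2\arccos(\alpha))^{1/p}\,\|X(e^{i\omega})h(\omega,c)\|_{L_\infty(-\pi,\pi)}.$$
The last inequality holds by Lemma 1(iv). It follows that $I_1\to0$ as $\gamma\to+\infty$.

Let us estimate $I_2$. Lemma 1(iii) gives that $|V(e^{i\omega})-1|\le1$ for all $\omega\in D_+(\alpha)$. We have that
$$I_2=\int_{D_+(\alpha)}|K(e^{i\omega})(1-V(e^{i\omega}))X(e^{i\omega})|^p\,d\omega\le\psi(\gamma)\|X(e^{i\omega})\|^p_{L_\infty(-\pi,\pi)},$$
where
$$\psi(\gamma)=\int_{D_+(\alpha)}|K(e^{i\omega})(1-V(e^{i\omega}))|^p\,d\omega=\int_{-\pi}^{\pi}\mathbb{I}_{D_+(\alpha)}(\omega)|K(e^{i\omega})(1-V(e^{i\omega}))|^p\,d\omega.$$
Here $\mathbb{I}$ denotes the indicator function.

By Lemma 1(ii), $\mathbb{I}_{D_+(\alpha)}(\omega)|K(e^{i\omega})(1-V(e^{i\omega}))|^p\to0$ a.e. as $\gamma\to+\infty$. By Lemma 1(iii),
$$\mathbb{I}_{D_+(\alpha)}(\omega)|K(e^{i\omega})(1-V(e^{i\omega}))|^p\le\sup_{\omega}|K(e^{i\omega})|^p\le1.$$
From the Lebesgue Dominated Convergence Theorem, it follows that $\psi(\gamma)\to0$ as $\gamma\to+\infty$. It follows that $I_1+I_2\to0$
for any $c>0$, $x\in\mathcal{X}(c)$. By the definition of $p$, we have that $1/p+1/r=1$. Hence $\|\hat y-y\|_{\ell_r}\to0$ as
$\gamma\to+\infty$ for any $x\in\mathcal{X}$. This completes the proof of statement (i).

Let us prove statement (ii). We have that
$$\|\hat Y(e^{i\omega})-Y(e^{i\omega})\|^p_{L_p(-\pi,\pi)}=I_1+I_2\le(2\arccos(\alpha)+\psi(\gamma))\|X(e^{i\omega})h(\omega,c_0)\|^p_{L_\infty(-\pi,\pi)}\le2\arccos(\alpha)+\psi(\gamma)$$
for any $x\in\mathcal{U}(c_0)$. For any $\varepsilon>0$, one can select $\gamma$ such that $\psi(\gamma)\le\varepsilon^p/2$ and that $2\arccos(\alpha)\le\varepsilon^p/2$.
This choice ensures that $\|\hat y-y\|_{\ell_r}\le\varepsilon$. This completes the proof of statement (ii). It follows that the
predicting kernels $\hat k(\cdot)=\mathcal{Z}^{-1}\hat K$ are such as required. This completes the proof of Theorem 1. □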

It can be noted that the choice of predicting kernels is not unique. In particular, the kernels preserve
the properties described in Theorem 1 for any selection of $\alpha=\alpha(\gamma)$ such that (3) holds. For instance,
$\alpha(\gamma)$ can be selected as
$$\alpha=\alpha(\gamma)=1-(\log\gamma)^{-1}\gamma^{\frac{2}{1-q}}.$$

In addition, it follows from the proofs that the uniform predictability from statement (ii) of Theorem 1
can be ensured with
$$\alpha=\alpha(\gamma)=1-\frac12\left(2\gamma/c_0\right)^{\frac{2}{1-q}}.$$
In this case, $\rho(\alpha)=(2\gamma/c_0)^{\frac{1}{q-1}}$. This corresponds to the case where $\mu=1$ in Theorem 1(iii).



IV On the prediction error generated by noise contamination

Let us estimate the prediction error for the case when the predictor designed for processes from $\mathcal{X}$ is
applied to a process with a small high-frequency noise contamination. Let us consider a process $x(\cdot)\in\ell_\infty$
such that $x=x_0+x_N$, where $x_0\in\mathcal{X}$, $x_N\in\ell_\infty$. The process $x_N$ represents the noise. Let $X=\mathcal{Z}x$,
$X_0=\mathcal{Z}x_0$, and $X_N=\mathcal{Z}x_N$. We assume that $X_0(e^{i\omega})\in L_1(-\pi,\pi)$ and $\|X_N(e^{i\omega})\|_{L_1(-\pi,\pi)}=\nu$.
The parameter $\nu\ge0$ represents the intensity of the noise.

Assume that the predictor is constructed as in Theorem 1 under the hypothesis that $\nu=0$ (i.e., that
$x_N=0$ and $x\in\mathcal{X}$). For an arbitrarily small $\varepsilon>0$, there exists $\gamma$ such that, if the hypothesis that $\nu=0$
is correct, then
$$\int_{-\pi}^{\pi}|(\hat K(e^{i\omega})-K(e^{i\omega}))X(e^{i\omega})|\,d\omega=\int_{-\pi}^{\pi}|(\hat K(e^{i\omega})-K(e^{i\omega}))X_0(e^{i\omega})|\,d\omega\le2\pi\varepsilon$$
and
$$\|\hat y-y\|_{\ell_\infty}\le\varepsilon.$$

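Let us estimate the prediction error for the case where $\nu>0$. We have that
$$\|\hat y-y\|_{\ell_\infty}\le J_0+J_N,$$
where
$$J_0=\frac{1}{2\pi}\|(\hat K(e^{i\omega})-K(e^{i\omega}))X_0(e^{i\omega})\|_{L_1(-\pi,\pi)},\qquad J_N=\frac{1}{2\pi}\|(\hat K(e^{i\omega})-K(e^{i\omega}))X_N(e^{i\omega})\|_{L_1(-\pi,\pi)}.$$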
The value $J_N$ represents the additional error caused by the presence of unexpected high-frequency noise
(when $\nu>0$). It follows that
$$\|\hat y-y\|_{\ell_\infty}\le\varepsilon+\nu(\kappa+1),\qquad (4)$$
where $\kappa=\sup_{\omega\in[-\pi,\pi]}|\hat K(e^{i\omega})|$.

Therefore, it can be concluded that the prediction is robust with respect to noise contamination for
any given $\varepsilon$. On the other hand, if $\varepsilon\to0$ then $\gamma\to+\infty$ and $\kappa\to+\infty$. In this case, error (4) is
increasing for any given $\nu>0$. This happens when the predictor is targeting too small a size of the error
for the processes from $\mathcal{X}$, i.e., under the assumption that $\nu=0$.

The equations describing the dependence of $\varepsilon$ and $\kappa$ on $\gamma$ could be derived similarly to estimates in
[3], Section 6, where it was done for different predicting kernels and for band-limited processes. We
leave it for future research.
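Although the exact dependence of $\kappa$ on $\gamma$ is left open, a crude lower bound already shows how quickly the noise term in (4) can grow. The sketch below assumes the kernels of Theorem 1(iii) with $\alpha(\gamma)=1-\gamma^{-2\mu/(q-1)}$ and evaluates $\hat K$ at the single point $z=-1$, where $V(-1)=1-\exp(\gamma/(1-\alpha))$, so that $\kappa+1\ge\exp(\gamma/(1-\alpha))$. The function names and the values $q=\mu=2$, $\nu=10^{-6}$ are hypothetical, and logarithms are returned because the quantities overflow double precision for moderate $\gamma$.

```python
import math

def log_kappa_plus_one_lower_bound(gamma, q=2.0, mu=2.0):
    """Lower bound for log(kappa + 1), obtained from z = -1 only.

    |K_hat(-1)| = |V(-1)| = exp(gamma / (1 - alpha)) - 1, hence
    kappa + 1 >= exp(gamma / (1 - alpha)) with 1 - alpha = gamma^(-2*mu/(q-1)).
    """
    one_minus_alpha = gamma ** (-2.0 * mu / (q - 1.0))
    return gamma / one_minus_alpha

def log_noise_term_lower_bound(nu, gamma, q=2.0, mu=2.0):
    """Lower bound for log(nu * (kappa + 1)), the noise term in bound (4)."""
    return math.log(nu) + log_kappa_plus_one_lower_bound(gamma, q, mu)

# Illustrative noise intensity; the printed values are natural logarithms.
for gamma in (2.0, 3.0, 5.0):
    print(gamma, log_noise_term_lower_bound(1e-6, gamma))
```

This only bounds the right-hand side of (4) from below; it does not say how large the actual prediction error is for a particular noise process.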

V Concluding remarks

(i) Technically, the predictors obtained above require the past values of $x(s)$ for all $s\in(-\infty,t]$.
However, $\sum_{s=-\infty}^{t}\hat k(t-s)x(s)$ can be approximated by $\sum_{s=-M}^{t}\hat k(t-s)x(s)$ for a large enough
$M>0$. Therefore, the predictors are robust with respect to replacing the semi-infinite time
interval of observations by a large finite one.

(ii) For processes from $\mathcal{X}$, the selection of a large enough $\gamma$ can ensure that the prediction error is
arbitrarily small. However, the corresponding predictor transfer function $\hat K$ is large in norm for
large $\gamma$. This leads to a large error caused by any noise contamination present for processes
$x(\cdot)\notin\mathcal{X}$, i.e., with the energy on higher frequencies that is not decaying fast enough near the
point $z=e^{i\pi}$. Nevertheless, the suggested predictors are robust with respect to the contamination
noise for any fixed $\gamma$. The error generated by the noise is limited if $\kappa$ in (4) is limited, i.e., if $\gamma$ is
limited and $\varepsilon$ in (4) is not too small.

(iii) The presence of robustness mentioned above leads to the conclusion that certain interval type
predictability for band-limited processes is not an abnormality that disappears with the presence
of arbitrarily small noise or some incompleteness of historical data. This predictability holds
even for processes with a certain rate of energy decay on higher frequencies that are not exactly
band-limited. This must be taken into account for all models where band-limited processes are
assumed. In particular, this implies that band-limited processes should be used with caution in the
models where predictability is difficult to justify such as models for financial time series.

(iv) The results of this paper can be applied to discrete time stationary random Gaussian processes with
the spectral densities $\phi$ such that $\int_{-\pi}^{\pi}\phi(e^{i\omega})h(\omega,c)\,d\omega<+\infty$, i.e., when the spectral density is
decaying fast enough on the higher frequencies. By the Szegő-Kolmogorov Theorem, it was known in
principle that the minimal (optimal) predicting error is zero in this case. However, it was unknown
how to construct the corresponding predictors for general classes of $\phi$.

(v) The restrictions on the spectral representations imposed on the underlying processes are quite
tight. They are not satisfied for the samples generated from autoregressive stochastic models
including ARMA, FARIMA, FEXP and other common long-memory processes (see, e.g., [12]).
It is yet unclear in which applications the processes with the required rate of energy decay on
higher frequencies could be found. Therefore, it is unclear where to find real data sets to test
the suggested predictors. We may suggest applying the corresponding predictors for band-limited
processes such as described in [10]. We leave it for future research.
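The finite-window approximation mentioned in remark (i) can be sketched as a truncated causal convolution. The code below is a generic illustration only: `kernel` and `x` are hypothetical callables, and the geometric kernel in the usage lines is a placeholder chosen because its tail is easy to compute; it is not the kernel $\hat k=\mathcal{Z}^{-1}\hat K$ of Theorem 1.

```python
def predict_truncated(kernel, x, t, M):
    """Finite-window version of x_hat(t) = sum_{s=-inf}^{t} kernel(t-s) * x(s):
    the sum is taken over s = -M, ..., t only.

    kernel(n) is assumed to be defined for integers n >= 0 (causal kernel),
    and x(s) returns the observation at integer time s.
    """
    return sum(kernel(t - s) * x(s) for s in range(-M, t + 1))

# Placeholder kernel with a summable tail and a constant input process.
geometric_kernel = lambda n: 0.5 ** (n + 1)
constant_input = lambda s: 1.0

# The full (semi-infinite) sum equals 1 here, so the gap to 1 is the
# truncation error, which shrinks as the window length M grows.
gaps = [1.0 - predict_truncated(geometric_kernel, constant_input, 3, M)
        for M in (5, 10, 20)]
```

For the kernels of Theorem 1 the decay of $\hat k(n)$ in $n$, and therefore the window length needed for a given accuracy, depends on $\gamma$ and is not quantified in this sketch.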